Ligand recognition mechanism of the human relaxin family peptide receptor 4 (RXFP4)

Members of the insulin superfamily regulate pleiotropic biological processes through two types of target-specific but structurally conserved peptides, insulin/insulin-like growth factors and relaxin/insulin-like peptides. The latter bind to the human relaxin family peptide receptors (RXFPs). Here, we report three cryo-electron microscopy structures of RXFP4–Gi protein complexes in the presence of the endogenous ligand insulin-like peptide 5 (INSL5) or one of the two small molecule agonists, compound 4 and DC591053. The B chain of INSL5 adopts a single α-helix that penetrates into the orthosteric pocket, while the A chain sits above the orthosteric pocket, revealing a peptide-binding mode previously unknown. Together with mutagenesis and functional analyses, the key determinants responsible for the peptidomimetic agonism and subtype selectivity were identified. Our findings not only provide insights into ligand recognition and subtype selectivity among class A G protein-coupled receptors, but also expand the knowledge of signaling mechanisms in the insulin superfamily.

1. Docking a known structure into a new cryo-EM map can be advantageous as a first approximation when solving unknown structures. In the model building and refinement section, the authors mention the structures used as initial models for solving RXFP4 and the G protein (lines 521-522). Could the authors explain why they chose these structures? Are they closely related to RXFP4? Is there any justification in terms of structure, sequence similarity, structure resolution or any other comparable feature? Could the authors propose a strategy for designing a robust methodology for solving membrane proteins from cryo-EM data in case no similar structures are available?
2. In line 524, could the authors provide more details on the use of restraints on the ligand coordinates? Could the authors explain how the ligand coordinates were assigned? What kind of restraints were imposed (positional, dihedral, distance, etc.)?
3. In line 526, what is meant by manual adjustment of the model and rebuilding? Is there any mathematical algorithm involved in fitting the structure into the density map?
4. What algorithm is implemented in the real space refinement in the PHENIX software?
5. For the structures of compound 4 and DC591053 bound to RXFP4-Gi, the initial structure of the INSL5-RXFP4-Gi complex was used (line 523). May the authors clarify the modeling process for compound 4 and DC591053, and how the atom positions were assigned for the synthetic agonists?
In the molecular dynamics simulation section, the authors describe a setup for the INSL5-RXFP4 complex in a POPC lipid bilayer.
1. May the authors provide more context on the purpose of the MD simulation study? Was it designed to test the consistency of the interactions found in the resolved structure against those found in an all-atom force field MD calculation?
2. From Figures 8 and 9 in the Supplementary Information, there is a shift of INSL5 in the receptor binding pocket. As the receptor binding pocket seems well preserved (Supplementary Figure 8c), what interactions produce the C-terminus motion of W24? Did the authors consider water-mediated interactions or the hydration level in the interhelical region? Internal water molecules in GPCR crystal structures have been reported in the literature.
3. The authors may include more details of the simulation protocol for reproducibility: I. system setup, II. equilibration and production protocols, III. analysis for data collection. Calculation of the binding free energy could be relevant to demonstrate the selectivity of DC591053 in the RXFP4-Gi complex. Indeed, in line 183 of the main text the authors mention that "C-terminal α-helix of the B-chain could stably insert into the orthosteric pocket through its tip residues". This statement is misleading, as the starting configuration already included INSL5 in the binding pocket. A description of the insertion mechanism would require calculating the transition from the unbound to the bound state using, for example, umbrella sampling.
The section "Expression and purification of the RXFP4-Gi complex" describes the procedure to produce the recombinant receptors in insect cells. May the authors briefly describe the procedure to produce the receptor mutants?
For structure validation it is recommended to report the Rama-Z score (Structure 28, 1249-1258.e1-e2, November 3, 2020), which provides a criterion for backbone geometry: |Z| > 3 for improbable geometry, 2 < |Z| < 3 for possible geometry, and |Z| < 2 for normal geometry. In Supplementary Table 1, the favored, allowed and outlier residues were reported as percentages, which is not a definitive criterion for a well-shaped distribution of Ramachandran angles.
On lines 51-52 the authors describe differences in the N-terminal tail between RXFP1-2 and RXFP3-4, and mention 43% sequence identity. Could the authors clarify whether the sequence identity refers to the TM domains, the N-terminus, or the overall structure?
relaxin-3. The residues exchanged from relaxin-3 to INSL5 are shown as red sticks and labelled. The B chain C-terminal α-helix of INSL5 is longer than that of relaxin-3, mainly because of the two adjacent glycines (-CGGSRW) in relaxin-3 that are absent in INSL5 (-CASSRW).
To further address this point, we performed additional alanine mutation and amino acid switching experiments at the equivalent positions of TMs 3, 5 and 7 in RXFP4 and RXFP3 (around the orthosteric binding pocket). As shown in Figure X2, INSL5 was totally inactive in the RXFP4 single mutants L118 3.29 S and L118 3.29 A as well as the double mutants L118 3.29 S+V122 3.33 S and L118 3.29 A+V122 3.33 A, whereas relaxin-3 retained partial activity although the curves shifted to the right (by 3.2-fold, 5.4-fold, 14.8-fold and 9.7-fold, respectively). For comparison, relaxin-3 activated S159 3.29 A, S159 3.29 L, S159 3.29 L+S163 3.33 V and S159 3.29 A+S163 3.33 A in RXFP3, albeit with reduced potencies. T295 7.39 V did not abolish the response of RXFP4 to INSL5 and relaxin-3, but V375 7.39 T in RXFP3 impaired both the potency (by 6.1-fold) and the Emax (66.5% of the wild-type, WT) of relaxin-3 (INSL5 was inactive in WT and all four RXFP3 mutants). Therefore, S159 3.29, S163 3.33 and V375 7.39 in RXFP3 and L118 3.29, V122 3.33 and T295 7.39 in RXFP4 are likely involved in RXFP3 vs. RXFP4 subtype selectivity, consistent with the observations in RXFP3/RXFP4 chimeric receptor studies (PMID: 18582868). Clearly, a cryo-EM structure of RXFP3, once available, will help to further elucidate this selectivity.
Figure X2. Key residues likely involved in receptor subtype selectivity. a, Binding mode of INSL5 (green) with RXFP4 (orange) in the cryo-EM structure. L118 3.29, V122 3.33 and T295 7.39 (marked red) probably contribute to INSL5 and relaxin-3 selectivity between RXFP4 and RXFP3. b-c, Effects of INSL5 and relaxin-3 on cAMP accumulation in wild-type (WT) and mutant RXFP4. d, Effects of relaxin-3 on cAMP accumulation in WT and mutant RXFP3. Data are shown as means ± S.E.M. of at least three independent experiments. Emax, maximum response.
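The rightward curve shifts quoted in the response above are ratios of EC50 values. As a minimal sketch of that arithmetic, assuming potencies are available as pEC50 values (the function names and the numbers in the usage note are illustrative and are not taken from the manuscript):

```python
import math


def ec50_from_pec50(pec50: float) -> float:
    """Convert pEC50 (-log10 of EC50 in mol/L) to EC50 in mol/L."""
    return 10.0 ** (-pec50)


def fold_shift(pec50_wt: float, pec50_mutant: float) -> float:
    """Fold change in EC50 of a mutant relative to wild-type.

    Values above 1 mean the mutant curve is shifted to the right
    (reduced potency); values below 1 mean increased potency.
    """
    return ec50_from_pec50(pec50_mutant) / ec50_from_pec50(pec50_wt)


def emax_percent_of_wt(emax_mutant: float, emax_wt: float) -> float:
    """Express a mutant's maximum response as a percentage of wild-type."""
    return 100.0 * emax_mutant / emax_wt
```

For example, a hypothetical mutant with pEC50 = 7.0 against a wild-type pEC50 of 8.0 gives `fold_shift(8.0, 7.0)`, a 10-fold rightward shift.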

Experimental details of the screening that led to the lead compound DC591053 and of INSL5 refolding should be provided.
Response: Thanks for the comment. We screened our in-house tetrahydroisoquinoline library to discover novel RXFP4 agonists using the cAMP accumulation assay. As a result, six compounds were found to display potent RXFP4 agonist activity (Figure X3), with DC591053 being the best (pEC50 = 7.24 ± 0.12, n = 3, as measured in stably transfected CHO-K1 cells). Importantly, DC591053 reacted with neither RXFP3 nor parental CHO-K1 cells. The screening studies will be summarized in another manuscript currently in preparation. To reflect this, we have revised the manuscript: "We screened our in-house tetrahydroisoquinoline library aimed at discovering novel RXFP4 agonists using cAMP accumulation assay.
At the end of fermentation, the biomass was harvested and the inclusion body was recovered for refolding, solubilized in 8 M urea solution and reduced by β-mercaptoethanol for 2 h. The reduced precursor was then refolded overnight.

Response: We thank the reviewer for the valuable suggestion. To further optimize the density of the receptor and ligands, we reran particle picking and tried several rounds of local refinement with different parameters as well as DeepEMhancer (PMID: 34267316), but the density did not improve. The poor electron densities of the INSL5 A chain and the morpholine ring of DC591053 were probably due to their relative flexibility and weak binding affinities. As the reviewer suggested, the near-atomic resolution models of the three ligands in the cryo-EM density maps have now been moved to Figure 1.
Meanwhile, the method part was expanded accordingly: "After the last round of refinement, the final map has an indicated global resolution of 2.75 Å at a FSC of 0.143. It was subsequently optimized using DeepEMhancer 73 before model building." To reflect the limitation of our structures, the following statement has been added to the discussion: "In this study, we present three Gi-bound RXFP4 structures in complex with its endogenous ligand INSL5, RXFP3/RXFP4 dual agonist compound 4 and RXFP4-specific agonist DC591053. Because of the high flexibility and the relatively weak binding affinities, the INSL5 A chain and morpholine ring of DC591053 showed low-resolution features compared with other regions of the ligands."
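The quoted global resolution is read off where the Fourier shell correlation (FSC) curve drops below 0.143. The sketch below shows one common way to extract that number from a tabulated curve by linear interpolation. It is an illustration under stated assumptions, not the procedure of the software used by the authors: the function name is hypothetical, and spatial frequencies are assumed to be positive, increasing and given in 1/Å.

```python
import numpy as np


def resolution_at_threshold(spatial_freq, fsc, threshold=0.143):
    """Return the resolution (in Å) at which the FSC first falls below threshold.

    spatial_freq: increasing, positive spatial frequencies in 1/Å.
    fsc: FSC value for each frequency shell.
    Returns None if the curve never drops below the threshold.
    """
    freq = np.asarray(spatial_freq, dtype=float)
    corr = np.asarray(fsc, dtype=float)
    below = np.nonzero(corr < threshold)[0]
    if below.size == 0:
        return None
    i = int(below[0])
    if i == 0:
        # Already below the threshold in the first shell.
        return 1.0 / freq[0]
    # Linear interpolation between the last shell above and the first shell below.
    t = (corr[i - 1] - threshold) / (corr[i - 1] - corr[i])
    f_cross = freq[i - 1] + t * (freq[i] - freq[i - 1])
    return 1.0 / f_cross
```

With a real half-map FSC table, the returned value corresponds to the reported "resolution at FSC = 0.143"; the exact figure depends on masking and on how the curve is sampled.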

The authors should analyze the expression level of RXFP4 mutants, which can affect the interpretation of the results especially for those with loss of response.
Response: We thank the reviewer for the suggestion. Detailed information on the mutants and their surface expression levels has been included in the revised Supplementary Table 4 (Table X1 below). As mentioned in the manuscript, the RXFP4 mutants T121A and H299A displayed significantly lower responses to the three ligands, but their expression levels were between 40.11% and 84.87% of the WT. Three RXFP4 mutants (R208A, W97A and E100A) could be activated by at least one of the three ligands; thus, their loss of response to the other ligands does not appear to be entirely attributable to surface expression.
Cell surface expression was measured by flow cytometry. Values were normalized to the wild-type in HEK293T cells.
Minor points: 1. The typos and grammar errors should be corrected throughout the paper.
Response: These points are well taken, thanks.
-Line 219: replace "significant challenge" by "very challenging"
Response: This point is well taken. We have revised the relevant sentence as: "Since the sequence identity of the ligand-binding pocket between RXFP3 and RXFP4 is 86.36%, development of receptor subtype-selective ligands is very challenging."

-Line 51: "with a relatively short N-terminal tail rather than LRR"
Response: This point is well taken. We have revised the relevant sentence as: "RXFP3 and RXFP4 have distinct binding properties with a relatively short N-terminal tail rather than LRR."

The authors should mention why developing selective agonists for RXFP4 is very important in the introduction. Is this
Response: We thank the reviewer for the valuable comments. In vivo, an overlapping expression pattern of RXFP4 and RXFP3 (PMIDs: 36184065; 27774604) and related physiological effects following their activation have been reported, including influences on food intake, body weight, energy rebalance and feeding behavior. However, the precise roles of RXFP3 and RXFP4 in these processes are still unclear, because most available ligands show in vitro cross-reactivity between RXFP3 and RXFP4. Thus, a subtype-specific agonist will be helpful to distinguish these two receptor subtypes. We have added the following statements to the introduction: "In addition to peptidic analogues, small molecule modulators have been reported in recent years. Compound 4, an amidino hydrazone-based scaffold identified by Novartis, is an RXFP3/RXFP4 dual agonist 18. Because of its high cross-reactivity, it cannot be used therapeutically. In vivo, the overlapping expression pattern between RXFP4 and RXFP3 as well as their distinct physiological properties 19,20 call for subtype specific agonists which will likely be valuable to different clinical applications. However, selective RXFP4 agonists discovered via high-throughput screening campaigns and follow-up structural modifications displayed deficiencies in solubility, potency and toxicity 21,22."

3. The paragraph "Characterization of DC591053" can be moved to the method section.
Response: Thanks for the suggestion. We have significantly revised the manuscript by moving some chemistry details to the method section.

Some characters in main and Supplementary Figures are too small. The BW numbers in Figs are barely visible.
Response: This point is well taken and all related figures have been revised accordingly.

Reviewer #2 (Remarks to the Author):
The of them in terms of GMQE, the sequence identity to the target and the experimental method used to obtain the structure (cryo-EM structures of G protein-bound, fully active receptor conformations are preferred as templates), the cryo-EM structure of the type 2 bradykinin receptor in complex with bradykinin (PDB code: 7F2O) was chosen as the initial model template of RXFP4, as it had the highest GMQE score (0.56) and good sequence identity (25.94%). As far as the G protein is concerned, we used a G protein construct identical to that of the previously described A1R-Gi cryo-EM structure (PDB code: 6D9H) for model building. The corresponding sentence in the manuscript has been revised: "According to the expected quality of the resulting models using SWISS-MODEL (https://swissmodel.expasy.org/interactive) with the quality estimated by Global Model Quality Estimate (GMQE) 74, the cryo-EM structure of bradykinin-B2R complex (PDB code: 7F2O) 29 was used as the initial model of RXFP4 and scFv16, while the cryo-EM structure of A1R-Gi complex (PDB code: 6D9H) 71 was used to generate the initial model of G proteins."

Developing robust methods for building membrane protein models from cryo-EM data without a reference structure is one of the fundamental tasks for computational biologists. Impressively, machine-learning technology has joined this effort, bringing many promising tools such as DeepTracer (PMID: 33361332), CryoDRGN (PMID: 33542510) and SAUA-FFR (PMID: 34142833).

Response: Thanks for the comment. In the model building process, the initial template was rigidly fitted to the electron density maps using the local optimization algorithm in UCSF Chimera v1.13.1. Then, based on the electron density, manual adjustment and rebuilding of the model were performed in COOT 0.9.4.1 for residues with poor density or geometry (PMIDs: 20383002 and 15572765). Such actions were taken primarily by means of the real-space refinement engine, which handles the refinement of the atomic model against an electron-density map and the regularization of the atomic model against geometric restraints. Based on the comparison of the model against the electron density and on comprehensive geometrical checks for protein structures from validation tools, we could further optimize the model interactively through other tools including "Regularize", "Rigid-body fit", "Rotate/translate", "Rotamer" and "Torsion editing" as implemented in COOT 0.9.4.1.

What algorithm is implemented in the real space refinement in the PHENIX software?
Response: Thanks for the question. As described in both the literature (PMIDs: 29872004 and 30198894) and the online documentation (https://phenix-online.org/documentation/reference/real_space_refine.html; https://phenix-online.org/documentation/overviews/cryo-em-real-space-refinement.html), multiple algorithms are involved in real space refinement (phenix.real_space_refine) in PHENIX. Basically, the real space refinement tool aims at obtaining a model that fits the map as well as possible while possessing a meaningful geometry (no validation outliers, such as Ramachandran plot or rotamer outliers). A target function guides the refinement by linking the model parameters to the experimental data and by scoring the model-versus-data fit. For cryo-EM data, refinement of the model is done in real space and the target function is formulated in terms of a three-dimensional map. Because there are generally too many model parameters, refinement requires additional restraints that modify the target function by creating relationships between independent parameters.
Specifically, the real space refinement tool begins by reading a model file, map data and other parameters such as resolution information and/or additional restraints for ligands. Then, the tool proceeds to calculations that constitute a set of tasks repeated multiple times (macro-cycles). The tasks performed during refinement combine several algorithms, including gradient-driven minimization of the entire model, simulated annealing, morphing, rigid-body refinement and local grid search. With the help of the PHENIX comprehensive validation program, which makes extensive use of the MolProbity validation algorithms, we could perform further optimization based on a detailed report on model quality and model-to-data fit (PMIDs: 29872004 and 31588918).
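The idea of a target function that balances map fit against geometric restraints can be illustrated with a deliberately small toy problem. The sketch below is not the PHENIX implementation: it assumes a one-dimensional "map" made of two Gaussian peaks, two "atoms" with a single bond-length restraint, a target of the form T = T_map + w × T_restraints, and plain gradient descent with numerical derivatives. All names and parameter values are hypothetical.

```python
import numpy as np


def map_value(x, peaks, sigma=0.5):
    """Toy 1D density: sum of Gaussians centred on `peaks`, sampled at each atom."""
    diff = x[:, None] - peaks[None, :]
    return np.exp(-diff ** 2 / (2.0 * sigma ** 2)).sum(axis=1)


def target(x, peaks, ideal_bond, weight):
    """T = T_map + weight * T_restraints (lower is better)."""
    x = np.asarray(x, dtype=float)
    peaks = np.asarray(peaks, dtype=float)
    t_map = -map_value(x, peaks).sum()                     # reward density at atom sites
    t_restraints = (abs(x[1] - x[0]) - ideal_bond) ** 2    # penalise bond-length deviation
    return t_map + weight * t_restraints


def refine(x0, peaks, ideal_bond, weight=1.0, step=0.05, n_iter=500, h=1e-5):
    """Gradient-driven minimisation of the toy target using central differences."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iter):
        grad = np.zeros_like(x)
        for i in range(x.size):
            shift = np.zeros_like(x)
            shift[i] = h
            grad[i] = (target(x + shift, peaks, ideal_bond, weight)
                       - target(x - shift, peaks, ideal_bond, weight)) / (2.0 * h)
        x -= step * grad
    return x
```

Starting two atoms at 0.4 and 1.1 with density peaks at 0.0 and 1.5 and an ideal bond of 1.5, `refine` moves the atoms toward the peaks while the restraint term keeps their separation close to the ideal value. Real refinement works on three-dimensional maps with many restraint types and additional search strategies, as listed in the response.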

In the molecular dynamics simulation section, the authors describe a setup for the INSL5-RXFP4 complex in a POPC lipid bilayer.

May the authors provide more context on the purpose of the MD simulation study? Was it designed to test the consistency of the interactions found in the resolved structure against those found in a MD all-atom Force Field calculation?
Response: We thank the reviewer for this important concern. Molecular dynamics (MD) simulation can provide unique insight into the dynamic properties of GPCRs in a way that is complementary to many experimental approaches (PMID: 29188561). Herein, exactly as the reviewer pointed out, the MD simulation of INSL5-RXFP4 was performed to examine both the overall stability of the INSL5-RXFP4 complex and its detailed residue-level interactions, thereby highlighting the key interactions that maintain peptide recognition (Supplementary Figure 8). In addition, to explore the structural stability of the peptide, we performed MD simulations of INSL5 (including both A and B chains) and of its B chain only (Supplementary Figure 9). The MD simulations were repeated independently three times with similar results.

Fig. 8a-f). Notably, the internal water molecules were found to fill the orthosteric pocket with the formation of multiple contacts with surrounding polar residues in both RXFP4 and the C terminus of INSL5 B chain during MD simulations (Supplementary Fig. 8g, h) as seen in other GPCRs 36-38."

Response: We totally agree with the above comments. To increase reproducibility, the detailed MD simulation steps were added to the manuscript as Supplementary Table 6, while the starting configuration of the MD simulations generated by the CHARMM-GUI webserver and all necessary input files have been uploaded to the submission system as a supporting file named "MDinputs-RXFP4.zip".

As the reviewer pointed out, the starting configuration of the MD simulation is built on the cryo-EM structure, in which INSL5 was already inserted into the binding pocket. The MD simulations verified that the binding between INSL5 and RXFP4 is stable, with the formation of multiple polar contacts. To avoid potential misunderstanding, the corresponding statement has been revised as "Consistently, molecular dynamics (MD) simulations found that the C-terminal α-helix of the B chain could stably maintain its insertion into the orthosteric pocket through its tip residues, evidenced by the interface area and representative minimum distances (R13 B -D104 2.67 , R23 B -E100 2.63 and W24 B -Q205 5.39 /R208 5.42 ) (Supplementary Fig. 8a-f)."

Supplementary Table 6 (fragment; columns: step, time step, duration, restraints)
(step not recovered): position restraint (20 kJ·mol⁻¹·Å⁻²) for the side-chain non-hydrogen atoms of protein and peptide
Step 6.6: 2 fs; 10 ns; harmonic position restraint (0.5 kJ·mol⁻¹·Å⁻²) for the backbone non-hydrogen atoms of protein and peptide
Step 7: 2 fs; 1000 ns; restraint-free

indicate that these sites may play important roles in subtype selectivity."

be reported, i.e., picking only water molecules forming the first "solvation" shell. From such analysis it is possible to detect persistent or conserved water-W24 interactions.

Response: We appreciate the reviewer's valuable comment. We plotted the time evolution of the number of water molecules within cut-off distances of 2.0 Å, 2.5 Å, 3.0 Å, 3.5 Å and 4.0 Å from W24 B during the molecular dynamics (MD) simulation (Figure X1). A water molecule was counted when its oxygen atom was located within the cut-off distance of at least one heavy atom of W24 B. The average numbers of water molecules within 2.0 Å, 2.5 Å, 3.0 Å, 3.5 Å and 4.0 Å of W24 B during the MD simulation are 0, 0.06, 4.56, 6.29 and 8.42, respectively. To reflect this, Supplementary Figure 8 has been significantly expanded by adding these two panels (Figure X1) to reveal the conserved water-W24 interactions during MD simulation.

Only water molecules whose oxygen atoms were located within the cut-off distance of at least one heavy atom in W24 B were counted. Five cut-off distances (2.0 Å, 2.5 Å, 3.0 Å, 3.5 Å and 4.0 Å) were adopted. The MD simulations were repeated independently three times with similar results.

Supplementary Table 6 shows the steps for pre-equilibration of the system. Is there any plot that the authors may show to determine the system stability? Often the box dimensions over time (the XY-area and Z-height) help to determine the stability of the simulation box, as does the RMSD of the Cα atoms of the protein reaching a stable average over time.
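The water-counting criterion described in the response to the hydration question (a water is counted when its oxygen lies within the cut-off of at least one heavy atom of W24 B) can be written down compactly. The following is a minimal sketch with plain NumPy arrays of coordinates in Å. It is not the authors' analysis script, the function names are hypothetical, and periodic boundary conditions are ignored for simplicity.

```python
import numpy as np


def count_waters_within(water_oxygens, residue_heavy_atoms, cutoff):
    """Count water oxygens within `cutoff` (Å) of at least one residue heavy atom.

    water_oxygens: (n_waters, 3) array of oxygen coordinates.
    residue_heavy_atoms: (n_atoms, 3) array of heavy-atom coordinates.
    """
    waters = np.asarray(water_oxygens, dtype=float)
    atoms = np.asarray(residue_heavy_atoms, dtype=float)
    # Pairwise distances, shape (n_waters, n_atoms).
    dist = np.linalg.norm(waters[:, None, :] - atoms[None, :, :], axis=2)
    # A water counts once, however many heavy atoms it is close to.
    return int((dist.min(axis=1) <= cutoff).sum())


def mean_counts(frames, cutoffs):
    """Average the per-frame counts over a trajectory for each cut-off.

    frames: iterable of (water_oxygens, residue_heavy_atoms) tuples, one per frame.
    Returns a dict mapping cut-off -> mean number of waters.
    """
    frames = list(frames)
    return {
        c: float(np.mean([count_waters_within(w, r, c) for w, r in frames]))
        for c in cutoffs
    }
```

Applied to every saved frame with the five cut-offs listed in the response, `mean_counts` would yield averages analogous to those reported, provided the same atom selections are used.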

Response: Thanks for the comment. Per the reviewer's suggestion, we calculated the Z-axis height and the XY-plane area of the simulation box, the potential energy of the MD simulation system, and the radius of gyration and root mean squared deviation (RMSD) of the Cα positions of RXFP4 during MD simulation, as shown in Figure X2. These results demonstrate the high stability of the MD simulation system: the simulation box was well maintained during MD simulation, and the simulated RXFP4 was well equilibrated at the desired temperature and pressure. To reflect this, Supplementary Figure 8 has been significantly expanded by adding these panels (Figure X2) to highlight the system stability during MD simulation.

using the Cα atoms, respectively.
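Two of the stability metrics named in the response, the radius of gyration and the Cα RMSD, are simple functions of the atomic coordinates. The sketch below shows how they are commonly computed for a single frame, with the RMSD taken after optimal superposition by the Kabsch algorithm. This is an illustration only: the response does not state which software or fitting scheme was used, and the function names are hypothetical.

```python
import numpy as np


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of a set of atoms (same units as coords)."""
    xyz = np.asarray(coords, dtype=float)
    m = np.ones(len(xyz)) if masses is None else np.asarray(masses, dtype=float)
    center = np.average(xyz, axis=0, weights=m)
    sq_dist = ((xyz - center) ** 2).sum(axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=m)))


def kabsch_rmsd(mobile, reference):
    """RMSD between two (n, 3) coordinate sets after optimal rigid superposition."""
    p = np.asarray(mobile, dtype=float)
    q = np.asarray(reference, dtype=float)
    p = p - p.mean(axis=0)          # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q                     # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T   # proper rotation, no reflection
    p_rot = p @ rot.T
    return float(np.sqrt(((p_rot - q) ** 2).sum(axis=1).mean()))
```

Evaluating `kabsch_rmsd` for the Cα coordinates of each frame against the starting structure, and `radius_of_gyration` for each frame, gives time series of the kind used to judge whether a simulation has reached a stable average.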